AITopics | attention operation

Neural Information Processing Systems | Jun-14-2026, 15:07:47 GMT

Video diffusion transformers have achieved remarkable progress in high-quality video generation, but remain computationally expensive due to the quadratic complexity of attention over high-dimensional video sequences. Recent acceleration methods improve efficiency by exploiting the local sparsity of attention scores; yet they often struggle with accelerating long-range computation. To address this problem, we propose VORTA, an acceleration framework with two novel components: (1) a sparse attention mechanism that efficiently captures long-range dependencies, and (2) a routing strategy that adaptively replaces full 3D attention with specialized sparse attention variants. VORTA achieves an end-to-end speedup of 1.76× without loss of quality on VBench, and it can integrate with various other acceleration methods.
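The two components named above can be illustrated with a minimal sketch, assuming a flat 1D token sequence (a real video DiT attends over a flattened 3D grid). It contrasts a local sliding-window variant, a strided long-range variant, and a hard router that picks one of them. The mask shapes, `route`, and the router logits are hypothetical illustrations, not VORTA's actual sparse attention or router.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Mask = Callable[[int, int], bool]  # mask(i, j): may query i attend to key j?


def full_mask(i: int, j: int) -> bool:
    # Dense attention: every token sees every token.
    return True


def local_mask(window: int) -> Mask:
    # Sliding-window variant: exploits local sparsity only.
    return lambda i, j: abs(i - j) <= window


def strided_mask(stride: int) -> Mask:
    # Long-range variant (hypothetical): each query sees a strided subset of
    # keys spanning the whole sequence, plus itself.
    return lambda i, j: j % stride == 0 or i == j


def masked_attention(q: List[Vector], k: List[Vector], v: List[Vector], mask: Mask) -> List[Vector]:
    """Softmax attention restricted to the (i, j) pairs allowed by `mask`."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = {j: scale * sum(a * b for a, b in zip(qi, kj))
                  for j, kj in enumerate(k) if mask(i, j)}
        m = max(scores.values())  # subtract the max for numerical stability
        weights = {j: math.exp(s - m) for j, s in scores.items()}
        z = sum(weights.values())
        out.append([sum(w * v[j][d] for j, w in weights.items()) / z
                    for d in range(len(v[0]))])
    return out


def route(router_logits: List[float], variants: List[str]) -> str:
    # Hard routing: pick the variant with the highest (hypothetical) router logit.
    best = max(range(len(router_logits)), key=lambda t: router_logits[t])
    return variants[best]


def attended_pairs(n: int, mask: Mask) -> int:
    # Number of query-key pairs scored: a simple proxy for attention cost.
    return sum(1 for i in range(n) for j in range(n) if mask(i, j))


if __name__ == "__main__":
    n = 16
    masks: Dict[str, Mask] = {"full": full_mask, "local": local_mask(1), "strided": strided_mask(4)}
    for name, mask in masks.items():
        print(name, attended_pairs(n, mask))
    print("routed to", route([0.1, 2.0, -1.0], ["full", "local", "strided"]))
```

With 16 tokens, the full mask scores 256 pairs while the local (window 1) and strided (stride 4) variants score 46 and 76, which is the kind of saving a router can exploit whenever a sparse variant is good enough for a given layer or head.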

machine learning, natural language, vorta, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Scalable In-context Ranking with Generative Models

Neural Information Processing Systems | Jun-11-2026, 13:18:08 GMT

In-context Ranking (ICR) is an emerging paradigm for Information Retrieval (IR) that leverages the contextual understanding of LLMs by directly incorporating the task description, candidate documents, and the query into the model's input prompt and tasking the LLM with identifying the relevant document(s). While effective, this paradigm faces a significant efficiency challenge, especially as the candidate list grows, because the attention operation scales quadratically (or at least super-linearly) with context length. To this end, this paper first identifies inherent and exploitable structures in the attention of LLMs fine-tuned for ICR: (1) inter-document block sparsity: attention is dense within each document block but sparse across different documents in the context; and (2) query-document block relevance: the attention scores from certain query tokens to a document block in middle layers strongly correlate with that document's actual relevance. Motivated by these observations, we introduce BlockRank (Blockwise In-context Ranking), a novel method that adapts the attention operation in an LLM by (a) architecturally enforcing the observed inter-document block sparsity, reducing attention complexity from quadratic to linear without loss in performance, and (b) optimizing query-document block relevance for truly relevant documents during fine-tuning using an auxiliary contrastive training objective, improving retrieval in attention. Experiments on BEIR, MSMarco and NQ with Mistral-7B demonstrate that BlockRank Mistral matches or outperforms existing SOTA listwise rankers and a controlled fine-tuned baseline while being significantly more efficient at inference (4.7x for 100 MSMarco documents in context) and scaling gracefully to long-context shortlists of around 500 in-context documents (approximately 100K context length) within a second, presenting a scalable and effective solution for ICR.
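The two observations can be made concrete with a small sketch, assuming a prompt laid out as [instruction][doc_0] ... [doc_{N-1}][query], causal attention, document tokens that see only their own block plus the shared instruction, and query tokens that see the whole context. The segment labels, this exact layout, and `rank_documents` (which ranks documents by the attention mass that chosen query rows place on each block) are illustrative assumptions, not BlockRank's implementation.

```python
from typing import List

PREFIX, QUERY = -1, -2  # hypothetical segment labels; documents are numbered 0..N-1


def segment_ids(prefix_len: int, doc_lens: List[int], query_len: int) -> List[int]:
    """Label each token of the prompt [instruction][doc_0]...[doc_{N-1}][query]."""
    ids = [PREFIX] * prefix_len
    for d, n in enumerate(doc_lens):
        ids += [d] * n
    ids += [QUERY] * query_len
    return ids


def block_sparse_mask(ids: List[int]) -> List[List[bool]]:
    """Causal mask with inter-document block sparsity (assumed layout)."""
    size = len(ids)
    mask = [[False] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):  # causal: only earlier or same positions
            si, sj = ids[i], ids[j]
            # Allowed if same block, the shared instruction prefix,
            # or the attending token is a query token (sees everything).
            mask[i][j] = si == sj or sj == PREFIX or si == QUERY
    return mask


def allowed_pairs(mask: List[List[bool]]) -> int:
    return sum(row.count(True) for row in mask)


def rank_documents(attn: List[List[float]], ids: List[int],
                   query_rows: List[int], num_docs: int) -> List[int]:
    """Rank documents by the attention mass the chosen query rows put on each block."""
    scores = [0.0] * num_docs
    for r in query_rows:
        for j, w in enumerate(attn[r]):
            if ids[j] >= 0:  # a document token
                scores[ids[j]] += w
    return sorted(range(num_docs), key=lambda d: scores[d], reverse=True)


if __name__ == "__main__":
    for n_docs in (10, 20, 40):
        ids = segment_ids(prefix_len=8, doc_lens=[16] * n_docs, query_len=8)
        dense = len(ids) * (len(ids) + 1) // 2  # dense causal attention
        print(n_docs, dense, allowed_pairs(block_sparse_mask(ids)))
```

With fixed document, instruction, and query lengths, the number of allowed pairs grows linearly in the number of documents, while dense causal attention grows quadratically in total length.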

artificial intelligence, large language model, natural language, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

13d7f172259b11b230cc5da8768abc5f-Paper-Conference.pdf

Neural Information Processing Systems | Feb-19-2026, 00:51:08 GMT

hard-attention transformer, operation, transformer, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Software (0.66)

103303dd56a731e377d01f6a37badae3-Paper.pdf

Neural Information Processing Systems | Feb-18-2026, 22:31:32 GMT

architecture, attention module, module, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

103303dd56a731e377d01f6a37badae3-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-7-2026, 12:17:39 GMT

autola, cbam, hoga, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.30)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Large-Scale In-Game Outcome Forecasting for Match, Team and Players in Football using an Axial Transformer Neural Network

Horton, Michael, Lucey, Patrick

arXiv.org Artificial Intelligence | Nov-25-2025

Football (soccer) is a sport that is characterised by complex game play, where players perform a variety of actions, such as passes, shots, tackles and fouls, in order to score goals and ultimately win matches. Accurately forecasting the total number of each action that each player will complete during a match is desirable for a variety of applications, including tactical decision-making, sports betting, and television broadcast commentary and analysis. Such predictions must consider the game state, the ability and skill of the players in both teams, the interactions between the players, and the temporal dynamics of the game as it develops. In this paper, we present a transformer-based neural network that jointly and recurrently predicts the expected totals for thirteen individual actions at multiple time-steps during the match, with predictions made for each individual player, each team, and at the game level. The neural network is based on an axial transformer that efficiently captures the temporal dynamics as the game progresses and the interactions between the players at each time-step. We present a novel axial transformer design that we show is equivalent to a regular sequential transformer, and the design performs well experimentally. We show empirically that the model makes consistent and reliable predictions, and efficiently makes ~75,000 live predictions at low latency for each game.
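Axial attention factorizes attention over a (time step × player) grid into two cheaper passes: one along time for each player, and one across players at each time step. The following is a minimal sketch assuming a single head, no learned projections, and causal masking along the time axis; it is generic axial attention, not the authors' specific design that they show to be equivalent to a sequential transformer.

```python
import math
from typing import List

Grid = List[List[List[float]]]  # [time][player][feature]


def attend(seq: List[List[float]], causal: bool) -> List[List[float]]:
    """Single-head dot-product self-attention with Q = K = V = seq (no projections)."""
    scale = 1.0 / math.sqrt(len(seq[0]))
    out = []
    for i, qi in enumerate(seq):
        keys = range(i + 1) if causal else range(len(seq))
        scores = [scale * sum(a * b for a, b in zip(qi, seq[j])) for j in keys]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wk * seq[j][d] for wk, j in zip(w, keys)) / z
                    for d in range(len(qi))])
    return out


def time_axis(x: Grid) -> Grid:
    """Attend along time separately for each player (causal: no peeking at future steps)."""
    steps, players = len(x), len(x[0])
    y: Grid = [[[] for _ in range(players)] for _ in range(steps)]
    for p in range(players):
        column = attend([x[t][p] for t in range(steps)], causal=True)
        for t in range(steps):
            y[t][p] = column[t]
    return y


def player_axis(x: Grid) -> Grid:
    """Attend across players separately at each time step (players are unordered)."""
    return [attend(x[t], causal=False) for t in range(len(x))]


def axial_block(x: Grid) -> Grid:
    # One axial layer: time mixing, then player mixing. The cost is roughly
    # O(P * T^2 + T * P^2) instead of O((T * P)^2) for full attention.
    return player_axis(time_axis(x))
```

After one axial layer, each player-step representation has mixed information from that player's past and from all players at the same step; stacking layers lets information flow across both axes.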

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Artificial Intelligence

2511.1873

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages

Andy Yang (University of Notre Dame), David Chiang (University of Notre Dame), Dana Angluin (Yale University)

Neural Information Processing Systems | Oct-9-2025, 19:01:24 GMT

A key technique in these proofs is the use of B-RASP, which, like RASP (Weiss et al., 2021), is a small programming language that compiles into transformers.
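To give a feel for the kind of operation B-RASP expresses, here is a minimal Python sketch of a masked unique-hard-attention step: each position selects the leftmost or rightmost allowed position whose boolean score holds and reads a value from it, falling back to a default when none exists. The function names and the a*b* example are illustrative assumptions; the paper defines B-RASP's exact syntax and semantics.

```python
from typing import Callable, List, TypeVar

V = TypeVar("V")


def hard_attend(n: int,
                mask: Callable[[int, int], bool],
                score: Callable[[int, int], bool],
                value: Callable[[int, int], V],
                default: Callable[[int], V],
                leftmost: bool = True) -> List[V]:
    """For each position i, pick the leftmost (or rightmost) j allowed by `mask`
    whose boolean `score` holds and return value(i, j); otherwise default(i)."""
    out: List[V] = []
    for i in range(n):
        hits = [j for j in range(n) if mask(i, j) and score(i, j)]
        if hits:
            j = hits[0] if leftmost else hits[-1]
            out.append(value(i, j))
        else:
            out.append(default(i))
    return out


def in_a_star_b_star(w: str) -> bool:
    """Membership in the star-free language a*b*: no 'a' may follow a 'b'."""
    is_b = [c == "b" for c in w]
    # b_before[i]: is there a 'b' strictly to the left of i? (strict-past mask)
    b_before = hard_attend(len(w),
                           mask=lambda i, j: j < i,
                           score=lambda i, j: is_b[j],
                           value=lambda i, j: True,
                           default=lambda i: False,
                           leftmost=False)
    # The final aggregation is done in plain Python here for brevity.
    return not any(c == "a" and b_before[i] for i, c in enumerate(w))
```

For example, `in_a_star_b_star("aabb")` accepts, while `in_a_star_b_star("aba")` rejects because the last `a` has a `b` to its left.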

hard-attention transformer, operation, transformer, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Software (0.66)

Sharing Key Semantics in Transformer Makes Efficient Image Restoration

Neural Information Processing Systems | Oct-9-2025, 18:30:28 GMT

To address these challenges, we propose boosting IR's performance by sharing the

dataset, semanir, zhang, (15 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > California > Merced County > Merced (0.04)
Europe > Bulgaria (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Education (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Auto Learning Attention

Benteng Ma

Neural Information Processing Systems | Oct-2-2025, 02:12:15 GMT

Attention modules have been shown to be effective in strengthening the representation ability of a neural network by reweighting spatial or channel features, or by stacking both operations sequentially. However, designing the structures of different attention operations requires a large amount of computation and extensive expertise.
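The reweighting operations described above can be illustrated with a minimal pure-Python sketch: a Squeeze-and-Excitation-style channel gate followed by a simplified spatial gate, stacked sequentially in the manner of CBAM. The weights, shapes, and the simplified spatial branch are assumptions for illustration; this is not AutoLA's searched HOGA module.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][height][width]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(x: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """SE-style channel reweighting. w1 is [hidden][C] and w2 is [C][hidden];
    both are hypothetical bottleneck weights."""
    # Squeeze: global average pooling per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excite: FC -> ReLU -> FC -> sigmoid yields one gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(wrow, pooled))) for wrow in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(wrow, hidden))) for wrow in w2]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]


def spatial_attention(x: FeatureMap) -> FeatureMap:
    """Simplified spatial gate: sigmoid of the channel mean at each position
    (CBAM's spatial branch instead convolves mean- and max-pooled maps)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    gate = [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]
    return [[[x[k][i][j] * gate[i][j] for j in range(w)] for i in range(h)]
            for k in range(c)]


def sequential_attention(x: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    # Stacking both operations sequentially: channel gate first, then spatial gate.
    return spatial_attention(channel_attention(x, w1, w2))
```

Each gate only rescales features and never changes the tensor shape, which is what lets such modules be inserted into an existing backbone; searching over how these operations are composed is the design space the abstract refers to.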

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Fair comparison and ablation study

Neural Information Processing Systems | Oct-2-2025, 02:12:05 GMT

The results on CIFAR10 are listed in Table R1. They show that the HOGA searched by AutoLA (k=4) still outperforms SE and CBAM by a large margin. We further customized SE and CBAM using the group split operation (denoted by "HOG"), resulting in a specific […] The HOGA searched by AutoLA outperforms its randomly searched counterparts (denoted by "Rand"). We tested the generalization ability of the HOGA searched on ResNet56 (denoted by "AutoLA_56") […] WiderResNet, indicating the consistent superiority of the HOGA searched by AutoLA over previous attention methods. We also compared AutoLA with SE and CBAM on a larger backbone (e.g., […]). The results in Table R3 suggest that AutoLA still outperforms other attention modules.

artificial intelligence, cbam, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.30)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)
